Introduction
Recent technological advances in genomics, including DNA microarrays and, more recently, next-generation sequencing, have made it possible to analyze entire genomes. Identifying and characterizing the genome-wide locations of transcription factor binding sites and chromatin modifications is critical for a comprehensive understanding of gene regulation under various biological conditions. ChIP-seq, which combines chromatin immunoprecipitation (ChIP) with massively parallel short-read sequencing, offers high specificity, sensitivity, and spatial resolution in profiling in vivo protein-DNA association; histones, histone variants, and modified histones; nucleosome positioning; polymerases and transcriptional machinery complexes; and DNA methylation (Holt and Jones, 2008; Park, 2009).
Although sequencing overcomes certain limitations of DNA-protein profiling with microarrays (ChIP-chip), it raises statistical and computational challenges, some shared with ChIP-chip and others novel. In particular, the large number of sequence reads generated by a single machine run and the diverse sources of bias make the analysis of ChIP-seq data challenging. To address these challenges, several research groups have proposed computational tools (e.g., Ji et al., 2008; Jothi et al., 2008; Kharchenko et al., 2008; Zhang et al., 2008b; Rozowsky et al., 2009; Spyrou et al., 2009; Qin et al., 2010). A common first step in the analysis of ChIP-seq data is to smooth the raw sequence read counts along each chromosome to obtain a sequence read profile (also known as a pile-up) that can be used to identify regions of interest (Pepke et al., 2009).
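To make the notion of a sequence read profile concrete, the following is a minimal sketch of one simple way to build a pile-up: each read is extended by a fixed length and the number of extended reads covering each base is counted. It is an illustration only and not the procedure of any of the tools cited above. The function name `pileup`, the parameter `extend`, the 0-based coordinates, and the choice to ignore strand are all assumptions made for this example.

```python
from itertools import accumulate


def pileup(read_starts, chrom_length, extend=200):
    """Per-base coverage along one chromosome.

    Assumptions (hypothetical, for illustration): read_starts are 0-based
    positions, every read is extended `extend` bases to the right, and
    strand is ignored.
    """
    # Difference array: +1 where an extended read begins, -1 just past its end.
    diff = [0] * (chrom_length + 1)
    for start in read_starts:
        if not 0 <= start < chrom_length:
            continue  # skip reads that fall outside the chromosome
        end = min(start + extend, chrom_length)  # clip at the chromosome end
        diff[start] += 1
        diff[end] -= 1
    # The running sum of the difference array is the coverage at each base.
    return list(accumulate(diff[:-1]))


# Toy example: four reads on a 15-base chromosome, each extended to 4 bases.
profile = pileup([2, 3, 3, 10], chrom_length=15, extend=4)
print(profile)
```

Extending reads before counting acts as a crude form of smoothing, since each read contributes to a window of positions rather than to a single base. Candidate regions of interest would then be stretches where the profile is unusually high.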